Statistical Modelling of COVID-19 Outbreak in Italy

04 Apr 2020



Nonlinear growth models

Nonlinear growth models represent an instance of nonlinear regression models, a class of models taking the general form \[ y = \mu(x, \theta) + \epsilon, \] where \(\mu(x, \theta)\) is the mean function which depends on a possibly vector-valued parameter \(\theta\), and a possibly vector-valued predictor \(x\). The stochastic component \(\epsilon\) represents the error with mean zero and constant variance. Usually, a Gaussian distribution is also assumed for the error term.

By defining the mean function \(\mu(x, \theta)\) we may obtain several different models, all characterized by the fact that parameters \(\theta\) enter in a nonlinear way into the equation. Parameters are usually estimated by nonlinear least squares which aims at minimizing the residual sum of squares.

Exponential

\[ \mu(x) = \theta_1 \exp\{\theta_2 x\} \] where \(\theta_1\) is the value at the origin (i.e. \(\mu(x=0)\)), and \(\theta_2\) represents the (constant) relative ratio of change (i.e. \(\frac{d\mu(x)}{dx }\frac{1}{\mu(x)} = \theta_2\)). Thus, the model describes an increasing (exponential growth if \(\theta_2 > 0\)) or decreasing (exponential decay if \(\theta_2 < 0\)) trend with constant relative rate.

Logistic

\[ \mu(x) = \frac{\theta_1}{1+\exp\{(\theta_2 - x)/\theta_3\}} \] where \(\theta_1\) is the upper horizontal asymptote, \(\theta_2\) represents the x-value at the inflection point of the symmetric growth curve, and \(\theta_3\) represents a scale parameter (and \(1/\theta_3\) is the growth-rate parameter that controls how quickly the curve approaches the upper asymptote).

Gompertz

\[ \mu(x) = \theta_1 \exp\{-\theta_2 \theta_3^x\} \] where \(\theta_1\) is the horizontal asymptote, \(\theta_2\) represents the value of the function at \(x = 0\) (displacement along the x-axis), and \(\theta_3\) represents a scale parameter.

The difference between the logistic and Gompertz functions is that the latter is not symmetric around the inflection point.

Richards

\[ \mu(x) = \theta_1 (1 - \exp\{-\theta_2 x\})^{\theta_3} \] where \(\theta_1\) is the horizontal asymptote, \(\theta_2\) represents the rate of growth, and \(\theta_3\) in part determines the point of inflection on the y-axis.

Data

Dipartimento della Protezione Civile: COVID-19 Italia - Monitoraggio della situazione http://arcg.is/C1unv

Source: https://github.com/pcm-dpc/COVID-19

url = "https://raw.githubusercontent.com/pcm-dpc/COVID-19/master/dati-andamento-nazionale/dpc-covid19-ita-andamento-nazionale.csv"
COVID19 <- read.csv(file = url, stringsAsFactors = FALSE)
COVID19$data <- as.Date(COVID19$data)
# DT::datatable(COVID19)

Warnings

- 29/03/2020: dati Regione Emilia Romagna parziali (dato tampone non aggiornato).
- 26/03/2020: dati Regione Piemonte parziali (-50 deceduti - comunicazione tardiva)
- 18/03/2020: dati Regione Campania non pervenuti.
- 18/03/2020: dati Provincia di Parma non pervenuti.
- 17/03/2020: dati Provincia di Rimini non aggiornati
- 16/03/2020: dati P.A. Trento e Puglia non pervenuti.
- 11/03/2020: dati Regione Abruzzo non pervenuti.
- 10/03/2020: dati Regione Lombardia parziali.
- 07/03/2020: dati Brescia +300 esiti positivi


Modelling total infected

# create data for analysis
data = data.frame(date = COVID19$data,
                  y = COVID19$totale_casi)
data$x = as.numeric(data$date) - min(as.numeric(data$date)) + 1
DT::datatable(data, options = list("pageLength" = 5))

Estimation

Exponential

mod1_start = lm(log(y) ~ x, data = data)
b = unname(coef(mod1_start))
start = list(th1 = exp(b[1]), th2 = b[2])
exponential <- function(x, th1, th2) th1 * exp(th2 * x)
mod1 = nls(y ~ exponential(x, th1, th2), data = data, start = start)
summary(mod1)
## 
## Formula: y ~ exponential(x, th1, th2)
## 
## Parameters:
##       Estimate Std. Error t value      Pr(>|t|)    
## th1 5189.02339  701.57832   7.396 0.00000000616 ***
## th2    0.08082    0.00377  21.436       < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8341 on 39 degrees of freedom
## 
## Number of iterations to convergence: 10 
## Achieved convergence tolerance: 0.000002992

Logistic

mod2 = nls(y ~ SSlogis(x, Asym, xmid, scal), data = data)
summary(mod2)
## 
## Formula: y ~ SSlogis(x, Asym, xmid, scal)
## 
## Parameters:
##          Estimate   Std. Error t value Pr(>|t|)    
## Asym 140161.34368   1529.60099   91.63   <2e-16 ***
## xmid     30.12508      0.16551  182.01   <2e-16 ***
## scal      5.75980      0.08376   68.76   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1042 on 38 degrees of freedom
## 
## Number of iterations to convergence: 0 
## Achieved convergence tolerance: 0.00000494

Gompertz

mod3 = nls(y ~ SSgompertz(x, Asym, b2, b3), data = data)
# start = list(Asym = coef(mod2)[1])
# tmp = list(y = log(log(start$Asym) - log(data$y)), x = data$x)
# b = unname(coef(lm(y ~ x, data = tmp)))
# start = c(start, c(b2 = exp(b[1]), b3 = exp(b[2])))
# mod3 = nls(y ~ SSgompertz(x, Asym, b2, b3), data = data, start = start,
#            control = nls.control(maxiter = 1000))
summary(mod3)
## 
## Formula: y ~ SSgompertz(x, Asym, b2, b3)
## 
## Parameters:
##           Estimate    Std. Error t value Pr(>|t|)    
## Asym 201376.297313   5196.033400   38.76   <2e-16 ***
## b2        9.819141      0.299281   32.81   <2e-16 ***
## b3        0.928817      0.001721  539.59   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1030 on 38 degrees of freedom
## 
## Number of iterations to convergence: 0 
## Achieved convergence tolerance: 0.000001019

Richards

richards <- function(x, th1, th2, th3) th1*(1 - exp(-th2*x))^th3
Loss  <- function(th, y, x) sum((y - richards(x, th[1], th[2], th[3]))^2) 
start <- optim(par = c(coef(mod2)[1], 0.001, 1), fn = Loss, 
               y = data$y, x = data$x)$par
names(start) <- c("th1", "th2", "th3")
mod4 = nls(y ~ richards(x, th1, th2, th3), data = data, start = start,
           # trace = TRUE, algorithm = "plinear", 
           control = nls.control(maxiter = 1000, tol = 0.1))
# algorithm is not converging... 
summary(mod4)
## 
## Formula: y ~ richards(x, th1, th2, th3)
## 
## Parameters:
##          Estimate    Std. Error t value Pr(>|t|)    
## th1 226359.226131  11187.916989   20.23   <2e-16 ***
## th2      0.059768      0.003186   18.76   <2e-16 ***
## th3      6.531151      0.402329   16.23   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1317 on 38 degrees of freedom
## 
## Number of iterations to convergence: 5 
## Achieved convergence tolerance: 0.006092
# library(nlmrt)
# mod4 = nlxb(y ~ th1*(1 - exp(-th2*x))^th3, 
#             data = data, start = start, trace = TRUE)

Models comparison

models = list("Exponential model" = mod1, 
              "Logistic model" = mod2, 
              "Gompertz model" = mod3,
              "Richards model" = mod4)
tab = data.frame(loglik = sapply(models, logLik),
                 df = sapply(models, function(m) attr(logLik(m), "df")),
                 Rsquare = sapply(models, function(m) 
                                  cor(data$y, fitted(m))^2),
                 AIC = sapply(models, AIC),
                 AICc = sapply(models, AICc),
                 BIC = sapply(models, BIC))
sel <- apply(tab[,4:6], 2, which.min)
tab$"" <- sapply(tabulate(sel, nbins = length(models))+1, symnum,
                 cutpoints = 0:4, symbols = c("", "*", "**", "***"))
knitr::kable(tab)
loglik df Rsquare AIC AICc BIC
Exponential model -427.3385 3 0.9670306 860.6771 861.3257 865.8178
Logistic model -341.5240 4 0.9994906 691.0480 692.1591 697.9023
Gompertz model -341.0502 4 0.9994939 690.1005 691.2116 696.9548 ***
Richards model -351.1288 4 0.9992209 710.2576 711.3687 717.1118
ggplot(data, aes(x = date, y = y)) + 
  geom_point() +
  geom_line(aes(y = fitted(mod1), color = "Exponential")) +
  geom_line(aes(y = fitted(mod2), color = "Logistic")) +
  geom_line(aes(y = fitted(mod3), color = "Gompertz")) +
  geom_line(aes(y = fitted(mod4), color = "Richards")) +
  labs(x = "", y = "Infected", color = "Model") +
  scale_color_manual(values = cols) +
  scale_y_continuous(breaks = seq(0, coef(mod2)[1], by = 5000),
                     minor_breaks = seq(0, coef(mod2)[1], by = 1000)) +
  scale_x_date(date_breaks = "2 day", date_labels =  "%b%d",
               minor_breaks = "1 day") +
  theme_bw() +
  theme(legend.position = "top")

last_plot() +
  scale_y_continuous(trans = "log10", limits = c(100,NA)) +
  labs(y = "Infected (log10 scale)")

Predictions

Point estimates

df = data.frame(x = seq(min(data$x), max(data$x)+14))
df = cbind(df, date = as.Date(df$x, origin = data$date[1]-1),
               fit1 = predict(mod1, newdata = df),
               fit2 = predict(mod2, newdata = df),
               fit3 = predict(mod3, newdata = df),
               fit4 = predict(mod4, newdata = df))
ylim = c(0, max(df[,c("fit2", "fit3")]))
ggplot(data, aes(x = date, y = y)) + 
  geom_point() +
  geom_line(data = df, aes(x = date, y = fit1, color = "Exponential")) +
  geom_line(data = df, aes(x = date, y = fit2, color = "Logistic")) +
  geom_line(data = df, aes(x = date, y = fit3, color = "Gompertz")) +
  geom_line(data = df, aes(x = date, y = fit4, color = "Richards")) +
  coord_cartesian(ylim = ylim) +
  labs(x = "", y = "Infected", color = "Model") +
  scale_y_continuous(breaks = seq(0, max(ylim), by = 10000),
                     minor_breaks = seq(0, max(ylim), by = 5000)) +
  scale_x_date(date_breaks = "2 day", date_labels =  "%b%d",
               minor_breaks = "1 day") +
  scale_color_manual(values = cols) +
  theme_bw() +
  theme(legend.position = "top",
        axis.text.x = element_text(angle=60, hjust=1))

Prediction intervals

# compute prediction using Moving Block Bootstrap (MBB) for nls
df = data.frame(x = seq(min(data$x), max(data$x)+14))
df = cbind(df, date = as.Date(df$x, origin = data$date[1]-1))

pred1 = cbind(df, "fit" = predict(mod1, newdata = df))
pred1[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod1, df[df$x > max(data$x),])[,2:3]

pred2 = cbind(df, "fit" = predict(mod2, newdata = df))
pred2[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod2, df[df$x > max(data$x),])[,2:3]

pred3 = cbind(df, "fit" = predict(mod3, newdata = df))
pred3[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod3, df[df$x > max(data$x),])[,2:3]

pred4 = cbind(df, "fit" = predict(mod4, newdata = df))
pred4[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod4, df[df$x > max(data$x),])[,2:3]

# predictions for next day
pred = rbind(subset(pred1, x == max(data$x)+1, select = 2:5),
             subset(pred2, x == max(data$x)+1, select = 2:5),
             subset(pred3, x == max(data$x)+1, select = 2:5),
             subset(pred4, x == max(data$x)+1, select = 2:5))
print(pred, digits = 3)
##           date    fit    lwr    upr
## 42  2020-04-05 154627 131267 179795
## 421 2020-04-05 124340 121803 126392
## 422 2020-04-05 129471 126627 132503
## 423 2020-04-05 130150 126662 134278

ylim = c(0, max(pred2$upr, pred3$upr, na.rm=TRUE))
ggplot(data, aes(x = date, y = y)) + 
  geom_point() +
  geom_line(data = pred1, aes(x = date, y = fit, color = "Exponential")) +
  geom_line(data = pred2, aes(x = date, y = fit, color = "Logistic")) +
  geom_line(data = pred3, aes(x = date, y = fit, color = "Gompertz")) +
  geom_line(data = pred4, aes(x = date, y = fit, color = "Richards")) +
  geom_ribbon(data = pred1, aes(x = date, ymin = lwr, ymax = upr), 
              inherit.aes = FALSE, fill = cols[1], alpha=0.3) +
  geom_ribbon(data = pred2, aes(x = date, ymin = lwr, ymax = upr), 
              inherit.aes = FALSE, fill = cols[2], alpha=0.3) +
  geom_ribbon(data = pred3, aes(x = date, ymin = lwr, ymax = upr),
              inherit.aes = FALSE, fill = cols[3], alpha=0.3) +
  geom_ribbon(data = pred4, aes(x = date, ymin = lwr, ymax = upr),
              inherit.aes = FALSE, fill = cols[4], alpha=0.3) +
  coord_cartesian(ylim = c(0, max(ylim))) +
  labs(x = "", y = "Infected", color = "Model") +
  scale_y_continuous(minor_breaks = seq(0, max(ylim), by = 10000)) +
  scale_x_date(date_breaks = "2 day", date_labels =  "%b%d",
               minor_breaks = "1 day") +
  scale_color_manual(values = cols) +
  theme_bw() +
  theme(legend.position = "top",
        axis.text.x = element_text(angle=60, hjust=1))

Modelling total deceased

# create data for analysis
data = data.frame(date = COVID19$data,
                  y = COVID19$deceduti)
data$x = as.numeric(data$date) - min(as.numeric(data$date)) + 1
DT::datatable(data, options = list("pageLength" = 5))

Estimation

Exponential

mod1_start = lm(log(y) ~ x, data = data)
b = unname(coef(mod1_start))
start = list(th1 = exp(b[1]), th2 = b[2])
exponential <- function(x, th1, th2) th1 * exp(th2 * x)
mod1 = nls(y ~ exponential(x, th1, th2), data = data, start = start)
summary(mod1)
## 
## Formula: y ~ exponential(x, th1, th2)
## 
## Parameters:
##       Estimate Std. Error t value     Pr(>|t|)    
## th1 324.775232  46.040888   7.054 0.0000000181 ***
## th2   0.096968   0.003863  25.101      < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 810.4 on 39 degrees of freedom
## 
## Number of iterations to convergence: 10 
## Achieved convergence tolerance: 0.000002861

Logistic

mod2 = nls(y ~ SSlogis(x, Asym, xmid, scal), data = data)
summary(mod2)
## 
## Formula: y ~ SSlogis(x, Asym, xmid, scal)
## 
## Parameters:
##        Estimate Std. Error t value Pr(>|t|)    
## Asym 18744.0342   298.0491   62.89   <2e-16 ***
## xmid    33.1924     0.2112  157.17   <2e-16 ***
## scal     5.4434     0.0906   60.08   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 131.9 on 38 degrees of freedom
## 
## Number of iterations to convergence: 0 
## Achieved convergence tolerance: 0.000003299

Gompertz

mod3 = nls(y ~ SSgompertz(x, Asym, b2, b3), data = data)
# manually set starting values
# start = list(Asym = coef(mod2)[1])
# tmp = list(y = log(log(start$Asym) - log(data$y)), x = data$x)
# b = unname(coef(lm(y ~ x, data = tmp)))
# start = c(start, c(b2 = exp(b[1]), b3 = exp(b[2])))
# mod3 = nls(y ~ SSgompertz(x, Asym, b2, b3), data = data, start = start, 
#            control = nls.control(maxiter = 10000))
summary(mod3)
## 
## Formula: y ~ SSgompertz(x, Asym, b2, b3)
## 
## Parameters:
##          Estimate   Std. Error t value Pr(>|t|)    
## Asym 31726.479610   803.038485   39.51   <2e-16 ***
## b2      12.379937     0.287557   43.05   <2e-16 ***
## b3       0.932859     0.001243  750.23   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 77.36 on 38 degrees of freedom
## 
## Number of iterations to convergence: 0 
## Achieved convergence tolerance: 0.00000204

Richards

richards <- function(x, th1, th2, th3) th1*(1 - exp(-th2*x))^th3
Loss  <- function(th, y, x) sum((y - richards(x, th[1], th[2], th[3]))^2) 
start <- optim(par = c(coef(mod2)[1], 0.001, 1), fn = Loss, 
               y = data$y, x = data$x)$par
names(start) <- c("th1", "th2", "th3")
mod4 = nls(y ~ richards(x, th1, th2, th3), data = data, start = start,
           # trace = TRUE, algorithm = "port", 
           control = nls.control(maxiter = 1000))
summary(mod4)
## 
## Formula: y ~ richards(x, th1, th2, th3)
## 
## Parameters:
##         Estimate   Std. Error t value Pr(>|t|)    
## th1 37046.637958  1781.387400   20.80   <2e-16 ***
## th2     0.056208     0.002259   24.89   <2e-16 ***
## th3     8.262838     0.381925   21.64   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 97.99 on 38 degrees of freedom
## 
## Number of iterations to convergence: 2 
## Achieved convergence tolerance: 0.000008786

Models comparison

models = list("Exponential model" = mod1, 
              "Logistic model" = mod2, 
              "Gompertz model" = mod3,
              "Richards model" = mod4)
tab = data.frame(loglik = sapply(models, logLik),
                 df = sapply(models, function(m) attr(logLik(m), "df")),
                 Rsquare = sapply(models, function(m) 
                                  cor(data$y, fitted(m))^2),
                 AIC = sapply(models, AIC),
                 AICc = sapply(models, AICc),
                 BIC = sapply(models, BIC))
sel <- apply(tab[,4:6], 2, which.min)
tab$"" <- sapply(tabulate(sel, nbins = length(models))+1, symnum,
                 cutpoints = 0:4, symbols = c("", "*", "**", "***"))
knitr::kable(tab)
loglik df Rsquare AIC AICc BIC
Exponential model -331.7498 3 0.9788341 669.4996 670.1483 674.6404
Logistic model -256.7708 4 0.9994634 521.5417 522.6528 528.3959
Gompertz model -234.9072 4 0.9997831 477.8145 478.9256 484.6688 ***
Richards model -244.5968 4 0.9996685 497.1937 498.3048 504.0480
ggplot(data, aes(x = date, y = y)) + 
  geom_point() +
  geom_line(aes(y = fitted(mod1), color = "Exponential")) +
  geom_line(aes(y = fitted(mod2), color = "Logistic")) +
  geom_line(aes(y = fitted(mod3), color = "Gompertz")) +
  geom_line(aes(y = fitted(mod4), color = "Richards")) +
  labs(x = "", y = "Deceased", color = "Model") +
  scale_color_manual(values = cols) +
  scale_y_continuous(breaks = seq(0, coef(mod2)[1], by = 500),
                     minor_breaks = seq(0, coef(mod2)[1], by = 100)) +
  scale_x_date(date_breaks = "2 day", date_labels =  "%b%d",
               minor_breaks = "1 day") +
  theme_bw() +
  theme(legend.position = "top")

last_plot() +
  scale_y_continuous(trans = "log10", limits = c(10,NA)) +
  labs(y = "Deceased (log10 scale)")

Predictions

Point estimates

df = data.frame(x = seq(min(data$x), max(data$x)+14))
df = cbind(df, date = as.Date(df$x, origin = data$date[1]-1),
               fit1 = predict(mod1, newdata = df),
               fit2 = predict(mod2, newdata = df),
               fit3 = predict(mod3, newdata = df),
               fit4 = predict(mod4, newdata = df))
ylim = c(0, max(df[,-(1:3)]))
ggplot(data, aes(x = date, y = y)) + 
  geom_point() +
  geom_line(data = df, aes(x = date, y = fit1, color = "Exponential")) +
  geom_line(data = df, aes(x = date, y = fit2, color = "Logistic")) +
  geom_line(data = df, aes(x = date, y = fit3, color = "Gompertz")) +
  geom_line(data = df, aes(x = date, y = fit4, color = "Richards")) +
  coord_cartesian(ylim = ylim) +
  labs(x = "", y = "Deceased", color = "Model") +
  scale_y_continuous(breaks = seq(0, max(ylim), by = 1000),
                     minor_breaks = seq(0, max(ylim), by = 1000)) +
  scale_x_date(date_breaks = "2 day", date_labels =  "%b%d",
               minor_breaks = "1 day") +
  scale_color_manual(values = cols) +
  theme_bw() +
  theme(legend.position = "top",
        axis.text.x = element_text(angle=60, hjust=1))

Prediction intervals

# compute prediction using Moving Block Bootstrap (MBB) for nls
df = data.frame(x = seq(min(data$x), max(data$x)+14))
df = cbind(df, date = as.Date(df$x, origin = data$date[1]-1))

pred1 = cbind(df, "fit" = predict(mod1, newdata = df))
pred1[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod1, df[df$x > max(data$x),])[,2:3]

pred2 = cbind(df, "fit" = predict(mod2, newdata = df))
pred2[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod2, df[df$x > max(data$x),])[,2:3]

pred3 = cbind(df, "fit" = predict(mod3, newdata = df))
pred3[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod3, df[df$x > max(data$x),])[,2:3]

pred4 = cbind(df, "fit" = predict(mod4, newdata = df))
pred4[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod4, df[df$x > max(data$x),])[,2:3]

# predictions for next day
pred = rbind(subset(pred1, x == max(data$x)+1, select = 2:5),
             subset(pred2, x == max(data$x)+1, select = 2:5),
             subset(pred3, x == max(data$x)+1, select = 2:5),
             subset(pred4, x == max(data$x)+1, select = 2:5))
print(pred, digits = 3)
##           date   fit   lwr   upr
## 42  2020-04-05 19068 16736 21711
## 421 2020-04-05 15642 15203 15951
## 422 2020-04-05 16262 16032 16476
## 423 2020-04-05 16335 16067 16644

ylim = c(0, max(pred2$upr, pred3$upr, na.rm=TRUE))
ggplot(data, aes(x = date, y = y)) + 
  geom_point() +
  geom_line(data = pred1, aes(x = date, y = fit, color = "Exponential")) +
  geom_line(data = pred2, aes(x = date, y = fit, color = "Logistic")) +
  geom_line(data = pred3, aes(x = date, y = fit, color = "Gompertz")) +
  geom_line(data = pred4, aes(x = date, y = fit, color = "Richards")) +
  geom_ribbon(data = pred1, aes(x = date, ymin = lwr, ymax = upr), 
              inherit.aes = FALSE, fill = cols[1], alpha=0.3) +
  geom_ribbon(data = pred2, aes(x = date, ymin = lwr, ymax = upr), 
              inherit.aes = FALSE, fill = cols[2], alpha=0.3) +
  geom_ribbon(data = pred3, aes(x = date, ymin = lwr, ymax = upr),
              inherit.aes = FALSE, fill = cols[3], alpha=0.3) +
  geom_ribbon(data = pred4, aes(x = date, ymin = lwr, ymax = upr),
              inherit.aes = FALSE, fill = cols[4], alpha=0.3) +
  coord_cartesian(ylim = c(0, max(ylim))) +
  labs(x = "", y = "Deceased", color = "Model") +
  scale_y_continuous(minor_breaks = seq(0, max(ylim), by = 1000)) +
  scale_x_date(date_breaks = "2 day", date_labels =  "%b%d",
               minor_breaks = "1 day") +
  scale_color_manual(values = cols) +
  theme_bw() +
  theme(legend.position = "top",
        axis.text.x = element_text(angle=60, hjust=1))

Modelling recovered

# create data for analysis
data = data.frame(date = COVID19$data,
                  y = COVID19$dimessi_guariti)
data$x = as.numeric(data$date) - min(as.numeric(data$date)) + 1
DT::datatable(data, options = list("pageLength" = 5))

Estimation

Exponential

mod1_start = lm(log(y) ~ x, data = data)
b = unname(coef(mod1_start))
start = list(th1 = exp(b[1]), th2 = b[2])
exponential <- function(x, th1, th2) th1 * exp(th2 * x)
mod1 = nls(y ~ exponential(x, th1, th2), data = data, start = start)
summary(mod1)
## 
## Formula: y ~ exponential(x, th1, th2)
## 
## Parameters:
##       Estimate Std. Error t value        Pr(>|t|)    
## th1 369.410857  40.987759   9.013 0.0000000000446 ***
## th2   0.100442   0.003011  33.356         < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 788.9 on 39 degrees of freedom
## 
## Number of iterations to convergence: 10 
## Achieved convergence tolerance: 0.000005522

Logistic

mod2 = nls(y ~ SSlogis(x, Asym, xmid, scal), data = data)
summary(mod2)
## 
## Formula: y ~ SSlogis(x, Asym, xmid, scal)
## 
## Parameters:
##        Estimate Std. Error t value Pr(>|t|)    
## Asym 30102.6923  1190.8192   25.28   <2e-16 ***
## xmid    36.2058     0.5321   68.04   <2e-16 ***
## scal     6.2476     0.1694   36.89   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 268.9 on 38 degrees of freedom
## 
## Number of iterations to convergence: 0 
## Achieved convergence tolerance: 0.000009217

Gompertz

mod3 = nls(y ~ SSgompertz(x, Asym, b2, b3), data = data)
summary(mod3)
## 
## Formula: y ~ SSgompertz(x, Asym, b2, b3)
## 
## Parameters:
##         Estimate  Std. Error t value Pr(>|t|)    
## Asym 75911.43526  5882.03133   12.91 1.83e-15 ***
## b2       9.68614     0.22633   42.80  < 2e-16 ***
## b3       0.95201     0.00184  517.36  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 161.3 on 38 degrees of freedom
## 
## Number of iterations to convergence: 1 
## Achieved convergence tolerance: 0.000003058

Richards

richards <- function(x, th1, th2, th3) th1*(1 - exp(-th2*x))^th3
Loss  <- function(th, y, x) sum((y - richards(x, th[1], th[2], th[3]))^2) 
start <- optim(par = c(coef(mod2)[1], 0.001, 1), fn = Loss, 
               y = data$y, x = data$x)$par
names(start) <- c("th1", "th2", "th3")
mod4 = nls(y ~ richards(x, th1, th2, th3), data = data, start = start,
           # trace = TRUE, # algorithm = "port", 
           control = nls.control(maxiter = 1000))
summary(mod4)
## 
## Formula: y ~ richards(x, th1, th2, th3)
## 
## Parameters:
##          Estimate    Std. Error t value       Pr(>|t|)    
## th1 159462.278023  34432.731207   4.631 0.000041725787 ***
## th2      0.026359      0.003213   8.203 0.000000000617 ***
## th3      4.891582      0.289208  16.914        < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 165.6 on 38 degrees of freedom
## 
## Number of iterations to convergence: 15 
## Achieved convergence tolerance: 0.000007097

Models comparison

models = list("Exponential model" = mod1, 
              "Logistic model" = mod2, 
              "Gompertz model" = mod3,
              "Richards model" = mod4)
tab = data.frame(loglik = sapply(models, logLik),
                 df = sapply(models, function(m) attr(logLik(m), "df")),
                 Rsquare = sapply(models, function(m) 
                                  cor(data$y, fitted(m))^2),
                 AIC = sapply(models, AIC),
                 AICc = sapply(models, AICc),
                 BIC = sapply(models, BIC))
sel <- apply(tab[,4:6], 2, which.min)
tab$"" <- sapply(tabulate(sel, nbins = length(models))+1, symnum,
                 cutpoints = 0:4, symbols = c("", "*", "**", "***"))
knitr::kable(tab)
loglik df Rsquare AIC AICc BIC
Exponential model -330.6474 3 0.9882744 667.2948 667.9435 672.4356
Logistic model -285.9941 4 0.9985015 579.9883 581.0994 586.8426
Gompertz model -265.0368 4 0.9993977 538.0735 539.1846 544.9278 ***
Richards model -266.1188 4 0.9993867 540.2375 541.3486 547.0918
ggplot(data, aes(x = date, y = y)) + 
  geom_point() +
  geom_line(aes(y = fitted(mod1), color = "Exponential")) +
  geom_line(aes(y = fitted(mod2), color = "Logistic")) +
  geom_line(aes(y = fitted(mod3), color = "Gompertz")) +
  geom_line(aes(y = fitted(mod4), color = "Richards")) +
  labs(x = "", y = "Recovered", color = "Model") +
  scale_color_manual(values = cols) +
  scale_y_continuous(breaks = seq(0, coef(mod2)[1], by = 500),
                     minor_breaks = seq(0, coef(mod2)[1], by = 100)) +
  scale_x_date(date_breaks = "2 day", date_labels =  "%b%d",
               minor_breaks = "1 day") +
  theme_bw() +
  theme(legend.position = "top")

last_plot() +
  scale_y_continuous(trans = "log10", limits = c(10,NA)) +
  labs(y = "Recovered (log10 scale)")

Predictions

Point estimates

df = data.frame(x = seq(min(data$x), max(data$x)+14))
df = cbind(df, date = as.Date(df$x, origin = data$date[1]-1),
               fit1 = predict(mod1, newdata = df),
               fit2 = predict(mod2, newdata = df),
               fit3 = predict(mod3, newdata = df),
               fit4 = predict(mod4, newdata = df))
ylim = c(0, max(df[,-(1:3)]))
ggplot(data, aes(x = date, y = y)) + 
  geom_point() + 
  geom_line(data = df, aes(x = date, y = fit1, color = "Exponential")) +
  geom_line(data = df, aes(x = date, y = fit2, color = "Logistic")) +
  geom_line(data = df, aes(x = date, y = fit3, color = "Gompertz")) +
  geom_line(data = df, aes(x = date, y = fit4, color = "Richards")) +
  coord_cartesian(ylim = ylim) +
  labs(x = "", y = "Recovered", color = "Model") +
  scale_y_continuous(breaks = seq(0, max(ylim), by = 1000),
                     minor_breaks = seq(0, max(ylim), by = 1000)) +
  scale_x_date(date_breaks = "2 day", date_labels =  "%b%d",
               minor_breaks = "1 day") +
  scale_color_manual(values = cols) +
  theme_bw() +
  theme(legend.position = "top",
        axis.text.x = element_text(angle=60, hjust=1))

Prediction intervals

# compute prediction using Moving Block Bootstrap (MBB) for nls
df = data.frame(x = seq(min(data$x), max(data$x)+14))
df = cbind(df, date = as.Date(df$x, origin = data$date[1]-1))

pred1 = cbind(df, "fit" = predict(mod1, newdata = df))
pred1[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod1, df[df$x > max(data$x),])[,2:3]

pred2 = cbind(df, "fit" = predict(mod2, newdata = df))
pred2[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod2, df[df$x > max(data$x),])[,2:3]

pred3 = cbind(df, "fit" = predict(mod3, newdata = df))
pred3[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod3, df[df$x > max(data$x),])[,2:3]

pred4 = cbind(df, "fit" = predict(mod4, newdata = df))
pred4[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod4, df[df$x > max(data$x),])[,2:3]

# predictions for next day
pred = rbind(subset(pred1, x == max(data$x)+1, select = 2:5),
             subset(pred2, x == max(data$x)+1, select = 2:5),
             subset(pred3, x == max(data$x)+1, select = 2:5),
             subset(pred4, x == max(data$x)+1, select = 2:5))
print(pred, digits = 3)
##           date   fit   lwr   upr
## 42  2020-04-05 25096 22823 27714
## 421 2020-04-05 21570 20782 22315
## 422 2020-04-05 22237 21829 22712
## 423 2020-04-05 22399 21965 22918

ylim = c(0, max(pred2$upr, pred3$upr, na.rm=TRUE))
ggplot(data, aes(x = date, y = y)) + 
  geom_point() +
  geom_line(data = pred1, aes(x = date, y = fit, color = "Exponential")) +
  geom_line(data = pred2, aes(x = date, y = fit, color = "Logistic")) +
  geom_line(data = pred3, aes(x = date, y = fit, color = "Gompertz")) +
  geom_line(data = pred4, aes(x = date, y = fit, color = "Richards")) +
  geom_ribbon(data = pred1, aes(x = date, ymin = lwr, ymax = upr), 
              inherit.aes = FALSE, fill = cols[1], alpha=0.3) +
  geom_ribbon(data = pred2, aes(x = date, ymin = lwr, ymax = upr), 
              inherit.aes = FALSE, fill = cols[2], alpha=0.3) +
  geom_ribbon(data = pred3, aes(x = date, ymin = lwr, ymax = upr),
              inherit.aes = FALSE, fill = cols[3], alpha=0.3) +
  geom_ribbon(data = pred4, aes(x = date, ymin = lwr, ymax = upr),
              inherit.aes = FALSE, fill = cols[4], alpha=0.3) +
  coord_cartesian(ylim = c(0, max(ylim))) +
  labs(x = "", y = "Recovered", color = "Model") +
  scale_y_continuous(breaks = seq(0, max(ylim), by = 5000),
                     minor_breaks = seq(0, max(ylim), by = 1000)) +
  scale_x_date(date_breaks = "2 day", date_labels =  "%b%d",
               minor_breaks = "1 day") +
  scale_color_manual(values = cols) +
  theme_bw() +
  theme(legend.position = "top",
        axis.text.x = element_text(angle=60, hjust=1))

Description of evolution

Positive cases and administered swabs

df = data.frame(date = COVID19$data,
                positives = c(NA, diff(COVID19$totale_casi)),
                swabs = c(NA, diff(COVID19$tamponi)))
df$x = as.numeric(df$date) - min(as.numeric(df$date)) + 1
# df$y = df$positives/df$swabs
df$y = df$positives/c(NA, zoo::rollmean(df$swabs, 2))
df = subset(df, swabs > 50)
# DT::datatable(df[,-4], )
ggplot(df, aes(x = date)) + 
  geom_point(aes(y = swabs, color = "swabs"), pch = 19) +
  geom_line(aes(y = swabs, color = "swabs")) +
  geom_point(aes(y = positives, color = "positives"), pch = 0) +
  geom_line(aes(y = positives, color = "positives")) +
  labs(x = "", y = "Number of cases", color = "") +
  scale_x_date(date_breaks = "2 day", date_labels =  "%b%d",
               minor_breaks = "1 day") +
  scale_color_manual(values = palette()[c(2,1)]) +
  theme_bw() +
  theme(legend.position = "top",
        axis.text.x = element_text(angle=60, hjust=1))

ggplot(df, aes(x = date, y = y)) + 
  geom_smooth(method = "loess", se = TRUE, col = "black") +
  geom_point(col=palette()[4]) + 
  geom_line(size = 0.5, col=palette()[4]) +
  labs(x = "", y = "% positives among admnistered swabs (two-day rolling mean)") +
  scale_y_continuous(labels = scales::percent_format(),
                     breaks = seq(0, 0.5, by = 0.05),
                     limits = c(0,NA)) +
  scale_x_date(date_breaks = "2 day", date_labels =  "%b%d",
               minor_breaks = "1 day") +
  theme_bw() +
  theme(legend.position = "top",
        axis.text.x = element_text(angle=60, hjust=1))

Hospitalized and ICU patients

df = data.frame(date = COVID19$data,
                hospital = c(NA, diff(COVID19$totale_ospedalizzati)),
                icu = c(NA, diff(COVID19$terapia_intensiva)))
df$x = as.numeric(df$date) - min(as.numeric(df$date)) + 1
ggplot(df, aes(x = date, y = hospital)) + 
  geom_smooth(method = "loess", se = TRUE, col = "black") +
  geom_point(col = "orange") + 
  geom_line(size = 0.5, col = "orange") +
  labs(x = "", y = "Change hospitalized patients") +
  coord_cartesian(ylim = range(df$hospital, na.rm = TRUE)) +
  scale_y_continuous(minor_breaks = seq(0, max(df$hospital, na.rm = TRUE), 
                                        by = 100)) +
  scale_x_date(date_breaks = "2 day", date_labels =  "%b%d",
               minor_breaks = "1 day") +
  theme_bw() +
  theme(legend.position = "top",
        axis.text.x = element_text(angle=60, hjust=1))

ggplot(df, aes(x = date, y = icu)) + 
  geom_smooth(method = "loess", se = TRUE, col = "black") +
  geom_point(col = "red2") + 
  geom_line(size = 0.5, col = "red2") +
  labs(x = "", y = "Change ICU patients") +
  coord_cartesian(ylim = range(df$icu, na.rm = TRUE)) +
  scale_y_continuous(minor_breaks = seq(0, max(df$icu, na.rm = TRUE), 
                                        by = 10)) +
  scale_x_date(date_breaks = "2 day", date_labels =  "%b%d",
               minor_breaks = "1 day") +
  theme_bw() +
  theme(legend.position = "top",
        axis.text.x = element_text(angle=60, hjust=1))